Behavioral Expectation Scales developed by Smith and Kendall were evaluated.
Results indicated slight interrater reliability between Head Nurses and
Supervisors, moderate dependence among five performance dimensions, and
correlation between two scales and tenure. Results are discussed in terms of
procedural problems,
critical incident problems, and perspective of raters. (Author)

Recently, several investigators (Campbell, Dunnette, Lawler, & Weick,
1970; Fogli, Hulin, & Blood, 1971; Landy & Guion, 1970; Maas, 1965) have used
or recommended using behaviorally anchored rating scales which are constructed
using a procedure reported by Smith and Kendall (1963). Briefly, the Smith and
Kendall procedure is an iterative technique which involves the development of
dimensions, scales, and items of performance criteria by independent groups.
There are several advantages of behavioral expectation scales (BES): (1) groups
with work experiences similar to those who eventually use the scales participate
in the construction of the scales; (2) behavioral incidents are used as anchor
points on each scale; (3) the terminology used on the job is retained in the
anchors; (4) relatively independent scales with high scale reliabilities are
obtained; and (5) in actual use, raters document their ratings with specific
incidents.

Smith and Kendall (1963) used the above procedure to develop scales to be
used for evaluation of staff nurses. Four groups of head nurses, from different
hospitals, participated in the study. Final results yielded five dimensions on
which staff nurses could be evaluated: knowledge and judgment,
conscientiousness, skill in human relations, organizational ability, and
observational ability. Scale reliabilities of the forms were .97 or better.
However, very little is known about the construct validity of the scales. Thus,
the purpose of the present study was to evaluate BES with respect to interrater
reliability, the independence of five dimensions, the relation of the scales to
other criteria, and the practical usefulness of the scales.


Method

Subjects 

The research site was a public, non-profit, 165-bed hospital in Northern
California. Two nursing supervisory levels, Head Nurses (HN) (N = 9) and
Supervisors (SUP) (N = 5), participated in the performance evaluation of 93
staff registered nurses (RN). The span of control for HNs and SUPs ranged from
four to seventeen, with a given HN and SUP pair not necessarily having a common
set of RNs to evaluate. For each dimension correlations were obtained between
two evaluation distributions, one provided by HNs and one provided by SUPs, for
the sample of RNs who were common to both rater samples. Depending on raters
involved and dimension evaluated, and because of missing data and incorrect use
of BES, final intercorrelation sample sizes varied from 71 to 92.
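The pairing rule just described, correlating HN and SUP evaluations only over the RNs rated by both, can be sketched as follows. This is a minimal illustration rather than the authors' computation: the source does not name the correlation coefficient, so a Pearson product-moment r is assumed, and the function names, the dictionary layout, and the sample values are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def interrater_r(hn_ratings, sup_ratings):
    """Correlate two raters' values over the nurses both of them rated.

    Each argument maps a nurse id to a summary value on one dimension,
    or to None when the rating is missing or unusable (assumed layout).
    Returns the correlation and the number of nurses it is based on.
    """
    common = sorted(
        rn for rn in hn_ratings
        if hn_ratings[rn] is not None and sup_ratings.get(rn) is not None
    )
    r = pearson_r(
        [hn_ratings[rn] for rn in common],
        [sup_ratings[rn] for rn in common],
    )
    return r, len(common)


# Hypothetical summary values on one dimension (not data from the study).
hn = {"rn1": 1.0, "rn2": 1.5, "rn3": 0.5, "rn4": None}
sup = {"rn1": 1.0, "rn2": 2.0, "rn3": 0.0, "rn5": 1.2}
r, n = interrater_r(hn, sup)
```

Dropping every RN who lacks a usable rating from either rater sample is what lets the sample size differ from one dimension to the next.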


The BES developed by Smith and Kendall (1963) was used for performance
evaluations. The appraisal procedure required that several incidents be used
for each dimension, that each incident be assigned a value with the behavioral
anchors serving as a frame of reference, and finally, that the average of the
values of the incidents for a given dimension be used as a summary value (Tate,
1964).
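As a small sketch of the averaging step, the summary value for one nurse on one dimension is the mean of the values assigned to that nurse's incidents. The function name and sample values below are hypothetical; the default limits of one and five incidents reflect the restriction the study placed on raters.

```python
def summary_value(incident_values, min_incidents=1, max_incidents=5):
    """Average the values of the incidents recorded for one nurse on one dimension.

    The default limits follow the study's restriction of at least one and
    no more than five incidents per dimension.
    """
    count = len(incident_values)
    if not min_incidents <= count <= max_incidents:
        raise ValueError(
            f"expected {min_incidents} to {max_incidents} incidents, got {count}"
        )
    return sum(incident_values) / count


# Hypothetical incident values for one nurse on one dimension.
example = summary_value([1.25, 1.5, 1.75])
```

A rater who records no incidents for a dimension yields no summary value, which is one way missing data can arise in the correlations.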

Each HN and SUP was given a brief training session on the use of BES. Each
rater was asked to record incidents on each dimension for each nurse for whom
she was responsible. The incidents were to reflect past performance of the RNs
or performance as observed in a following two-month period. A restriction of
at least one incident and (so that time spent on evaluations would not become
excessive) no more than five incidents for each dimension was imposed. (The
raters were given time off from regular duties to work on the BES evaluations.)

Tenure and absentee data were collected and correlated with performance
evaluations. Absentee information was recorded dichotomously, an RN being either
a chronic absentee or a non-absentee. Absentee information was obtained from a
list that hospital administration had prepared, while the present research was
in operation, because of concern about increasing chronic absenteeism.
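Because absenteeism was recorded as a dichotomy, its correlation with a continuous performance value can be illustrated with a point-biserial coefficient, which is numerically the same as a Pearson r computed on 0/1 codes. The source does not say which coefficient was used, so this is an assumption, and the function name and data below are hypothetical.

```python
from math import sqrt


def point_biserial(flags, scores):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    flags  -- 1 for a chronic absentee, 0 for a non-absentee (assumed coding)
    scores -- the matching performance values
    """
    if len(flags) != len(scores):
        raise ValueError("flags and scores must have the same length")
    if any(f not in (0, 1) for f in flags):
        raise ValueError("flags must be coded 0 or 1")
    group1 = [s for f, s in zip(flags, scores) if f == 1]
    group0 = [s for f, s in zip(flags, scores) if f == 0]
    if not group1 or not group0:
        raise ValueError("both groups must contain at least one case")
    n = len(scores)
    mean_all = sum(scores) / n
    # Population standard deviation of all scores.
    sd_all = sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    if sd_all == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    p = len(group1) / n
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd_all * sqrt(p * (1 - p))


# Hypothetical data: two chronic absentees, two non-absentees.
r_pb = point_biserial([1, 1, 0, 0], [1.0, 2.0, 3.0, 4.0])
```

With this coding a negative coefficient means that chronic absentees received lower performance values.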

In addition, six months after the ratings were completed, all incidents
contributed by the raters and the original anchor items on the BES were
randomized and presented to five judges (nurses) for reevaluation. The five
judges were to examine each incident, assign it to one of the five BES
dimensions, and also to assign to it a value from 0.00 to 2.00, which was the
scale range Smith and Kendall (1963) used. This procedure provided an
examination of the appropriateness of the items for a dimension and the
interrater reliability of item values.
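One way to score the judges' dimension assignments is the percentage of items each judge places in the dimension the item was originally written for. The sketch below assumes that scoring rule; the function name, item ids, and assignments are hypothetical, and only the dimension names come from the study.

```python
def percent_correct(judge_assignments, intended):
    """Percentage of items a judge assigned to the item's intended dimension.

    judge_assignments -- maps item id to the dimension the judge chose
    intended          -- maps item id to the dimension the item was written for
    An item the judge did not assign counts as incorrect.
    """
    if not intended:
        raise ValueError("need at least one item")
    hits = sum(
        1 for item, dimension in intended.items()
        if judge_assignments.get(item) == dimension
    )
    return 100.0 * hits / len(intended)


# Hypothetical items and one judge's assignments (not data from the study).
intended = {
    "item1": "knowledge and judgment",
    "item2": "conscientiousness",
    "item3": "skill in human relations",
    "item4": "organizational ability",
}
judge = {
    "item1": "knowledge and judgment",
    "item2": "observational ability",
    "item3": "skill in human relations",
    "item4": "organizational ability",
}
score = percent_correct(judge, intended)
```

Computing this once per judge gives a range of percentages across the five judges.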


Results 

Results are shown in Table 1. The correlation diagonal in parentheses,
representing interrater agreement, provided evidence that HNs and SUPs were in
significant agreement in rating RNs on all five dimensions. Agreement between
the two rater samples was best for human relations skill and poorest for
conscientiousness and observational ability.

The degree of independence of the dimensions for HNs and SUPs is shown in
the solid triangles. For both rater samples the intercorrelations between
dimensions were significant and high. The amount of dependence ranged from .38
to .62 for HNs and from .49 to .62 for SUPs.

Table 1 also shows the relationship of absenteeism and of tenure to each
of the performance dimensions for both groups of raters. Tenure was significantly
correlated with knowledge and judgment and with conscientiousness as appraised
by HNs. None of the correlations between absenteeism or tenure and the performance
dimensions as appraised by SUPs were significant. The relationship between tenure
and absenteeism was .37 (N = 95, p < .01).
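The reported significance level can be checked with the usual t test for a correlation coefficient, t = r * sqrt((N - 2) / (1 - r^2)) with N - 2 degrees of freedom. The source does not state which test was applied, so this is an assumed, illustrative check, and the function name is hypothetical.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing a correlation r from n cases against zero (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 cases")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r * r))


# Values reported for the tenure and absenteeism relationship.
t = t_from_r(0.37, 95)
```

For r = .37 and N = 95 this gives t of roughly 3.8 on 93 degrees of freedom, above the two-tailed .01 critical value of about 2.63, which is consistent with the reported p < .01.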

Discussion

The major purpose of the present study was to evaluate the Smith and Kendall
(1963) BES in a field situation in terms of construct validity. As a result,
interesting problems pertaining to BES in particular and to performance appraisal
in general surfaced.

The moderate degree of interrater reliability caused us to reassess our
expectation that the two supervisory levels would agree on appraisals. In essence,
BES scales were developed by supervisory personnel, retained nursing terminology,
and provided meaningful dimensions. However, the two present supervisory groups
were not equivalent to each other in their opportunity to evaluate staff nurses.
HNs supervise wards and have direct contact with RNs, thereby placing them in
an adequate position to observe and evaluate the RNs. On the other hand, SUPs
are involved in more administrative functions, coordinating the activities in
various areas of the hospital. As a result, SUPs have less opportunity to
observe and evaluate RNs.

Hence, there are several reasons for the moderate amount of interrater
reliability obtained in the present study. First, SUPs and HNs are two supervisory
levels, each with different functions. Second, SUPs and HNs do not have
equivalent opportunity to observe and evaluate RNs. Third, an hypothesis that SUPs
and HNs emphasize similar behaviors has not been tested. If their different
perspectives also cause the two groups to expect or value different behaviors,
then the moderate interrater reliability is not surprising. In fact, geographical,
demographic, or organizational differences between the nurses in this study and
those in the Smith and Kendall (1963) study may be related to variation in
perspective and expectations and, consequently, may account for the present results.

With respect to independence of dimensions, Smith and Kendall (1963) found
five independent traits, independent in the sense that items and dimensions
retained were meaningful and distinguishable for the original research samples.
The generalizability of the Smith and Kendall items is confirmed by the present
judges, who exhibited high dimension and value agreement. That is, the five
judges correctly categorized the original BES items. (Per cent correct
categorization for the judges ranged from 76% to 93%.)


However, the high interdimension correlations can be explained by considerable
within-rater halo and method bias. The incidents provided by the present raters
were vague, and there was only moderate agreement between judges as to their
appropriateness for the indicated dimensions. (Per cent correct categorization
for the judges ranged from 53% to 59%.)


Thus, the problem becomes one of utility of the specific BES; are the scales
appropriate for one or more groups of raters? Several attempts were undertaken
to investigate this problem. First, moderate interrater reliability but high
dimension intercorrelation, indicating poor construct validity, ordinarily would
limit the usefulness of new scales. However, given the extensive development
procedure and the fact that we are dealing with perceptions of behavior,
additional relationships were examined. The significant correlations between tenure
and knowledge and judgment, and between tenure and conscientiousness, indicated
that the relatively subjective performance BES criteria have some meaningful
variance in common with objective criteria. Since the HNs' performance
evaluations correlated with tenure, support is obtained for HNs as the appropriate
raters. Second, support for HNs as appropriate raters pertains to amount of
contact with RNs. Organizational structure prescribes that HNs have more contact
with RNs, and consequently have more opportunity to observe behavior. Third,
examination of the specific incidents cited and their accompanying values
indicated that, in general, the items for a given scale provided by HNs were more
consistent, whereas the items for a given scale by SUPs were more variable, and
often the two items provided by SUPs were at opposite poles. Fourth, the
intercorrelations between dimensions were less for HNs than for SUPs.

As a result, the authors conclude that the scales are more appropriate for
HNs than for SUPs in the present study. The authors suggest examining the
procedure as used in the present study as opposed to the one suggested by Smith
and Kendall (1963). The present raters provided one to five incidents, which
were subsequently averaged. In contrast, Smith and Kendall suggested that the
rater provide a summary rating and then document the rating. The latter
procedure eliminates variance within ratings for a given rater on a given scale
applied to a given ratee. However, the question arises as to whether making a
summary rating will dictate the type of documentations and, in effect, will
provide an accurate description of the ratee's behavior. Both of the above
procedures should be tried and the results compared.

A final recommendation is that considerable training be given to the raters
in the use of the scales, particularly in an attempt to get support from the
raters. The present raters were given approximately a half-hour training
session in the use of BES. During the two-month observation and rating period
several raters complained that the procedure was too time-consuming and
difficult. However, when the research was completed, the raters were debriefed and
given examples of “good” contributed incidents and “poor” contributed incidents.
At that time some raters indicated that they had not known exactly what kind
of incidents could be considered “good,” which suggested that if they had received
additional and more comprehensive training, results would have more accurately
reflected the actual situation. In addition, most raters indicated that they
preferred the BES format to the Likert-type rating scale which they had used in
the past.